DiNoDB: an Interactive-speed Query Engine for Ad-hoc Queries on Temporary Data

Abstract

As data sets grow in size, analytics applications struggle to get instant insight into large datasets. Modern applications involve heavy batch processing jobs over large volumes of data and at the same time require efficient ad-hoc interactive analytics on temporary data. Existing solutions, however, typically focus on one of these two aspects, largely ignoring the need for synergy between the two. Consequently, interactive queries need to re-iterate costly passes through the entire dataset (e.g., data loading) that may provide meaningful return on investment only when data is queried over a long period of time. In this paper, we propose DiNoDB, an interactive-speed query engine for ad-hoc queries on temporary data. DiNoDB avoids the expensive loading and transformation phase that characterizes both traditional RDBMSs and current interactive analytics solutions. It is tailored to modern workflows found in machine learning and data exploration use cases, which often involve iterations of cycles of batch and interactive analytics on data that is typically useful for a narrow processing window. The key innovation of DiNoDB is to piggyback on the batch processing phase the creation of metadata that DiNoDB exploits to expedite the interactive queries. Our experimental analysis demonstrates that DiNoDB achieves very good performance for a wide range of ad-hoc queries compared to alternatives such as Hive, Stado, SparkSQL and Impala.
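
The piggybacking idea can be illustrated with a small sketch. The abstract does not say what metadata DiNoDB creates, so the byte-offset index used here is only an assumed example of such metadata, and the names `batch_write` and `fetch_row` are hypothetical. The batch step writes its output and records, in the same pass, where each line starts; a later ad-hoc lookup then seeks straight to the requested row instead of loading or rescanning the file.

```python
import io


def batch_write(rows, out):
    """Write rows as comma-separated lines to a binary stream.

    While writing, record the byte offset at which each line starts.
    The offsets are the "metadata" produced as a by-product of the
    batch pass (an assumption made for this illustration only).
    """
    offsets = []
    for row in rows:
        offsets.append(out.tell())
        line = ",".join(str(value) for value in row) + "\n"
        out.write(line.encode("utf-8"))
    return offsets


def fetch_row(data, offsets, index):
    """Return the fields of one row by seeking to its recorded offset."""
    data.seek(offsets[index])
    return data.readline().decode("utf-8").rstrip("\n").split(",")


if __name__ == "__main__":
    buffer = io.BytesIO()
    row_offsets = batch_write([(1, "a"), (22, "bb"), (333, "ccc")], buffer)
    print(fetch_row(buffer, row_offsets, 2))
```

The point of the sketch is only that the index costs no extra pass over the data: it is built while the batch output is being written anyway.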
